I/O-optimal Algorithms for Orthogonal Problems for Private-Cache Chip Multiprocessors
Authors
Abstract
The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. Recently, a parallel version of the distribution sweeping framework was introduced to efficiently solve a number of orthogonal geometric problems in the PEM model. In this paper we improve the framework to the optimal O(sort_P(N) + K/PB) I/Os, where P is the number of cores/processors, B is the number of elements that fit into a cache line, N and K are the sizes of the input and output, respectively, and sort_P(N) denotes the I/O complexity of sorting N items in a P-processor PEM model. We achieve this with a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves O((N + K)/PB) I/O complexity, where K is the sum of the counts of all the ranges. The key to achieving efficient load balancing among the processors for this problem is a new method to count the output without enumerating it, which might be of independent interest.
Keywords: parallel external memory, PEM, multicore algorithms, computational geometry, parallel distribution sweeping
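The idea in the abstract of counting the output without enumerating it can be illustrated with a small sequential sketch. This is not the paper's PEM algorithm. It assumes closed ranges and an already sorted point list, and the function names `range_counts` and `split_output` are hypothetical. The first function counts the points in each range by binary search. The second uses prefix sums of those counts to assign whole ranges to p workers so that each handles roughly K/p reported items, which is a deliberately coarse form of load balancing.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def range_counts(sorted_points, ranges):
    """Count the points inside each closed range [lo, hi] without listing them.

    Assumes sorted_points is in non-decreasing order (an assumption of this sketch).
    """
    return [
        bisect_right(sorted_points, hi) - bisect_left(sorted_points, lo)
        for lo, hi in ranges
    ]


def split_output(counts, p):
    """Assign each range to one of p workers using prefix sums of the counts.

    Simplification: a range is never split between workers, so one very
    large range can still unbalance the load.
    """
    total = sum(counts)  # K, the total output size
    if total == 0:
        return [0] * len(counts)
    share = -(-total // p)  # ceil(K / p) output items per worker
    starts = accumulate(counts, initial=0)  # exclusive prefix sums
    return [min(start // share, p - 1) for start, _ in zip(starts, counts)]


points = [1, 3, 4, 7, 9]
ranges = [(0, 2), (3, 7), (8, 10), (5, 6)]
counts = range_counts(points, ranges)
owners = split_output(counts, 2)
print(counts, owners)
```

Because the counts are known before any output is written, each worker's share can be fixed in advance. That is the property the abstract credits for efficient load balancing.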
Similar resources
Geometric Algorithms for Private-Cache Chip Multiprocessors
We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort ...
A Reusability-Aware Cache Memory Sharing Technique for High Performance CMPs with Private L2 Caches
For high-performance chip multiprocessors (CMPs) to achieve their maximum performance potential, efficient support for the memory hierarchy is important. Since off-chip accesses incur a long latency, high-performance CMPs are typically based on multiple levels of on-chip cache memories. For example, most current CMPs support two levels of on-chip caches. While the L1 cache architecture of thes...
Utilization of Cache Area in On-Chip Multiprocessor
An on-chip multiprocessor can be an alternative to the wide-issue superscalar processor approach, which is currently the mainstream way to exploit the increasing number of transistors on a silicon chip. Utilization of the cache, especially for remote data, is important in systems using such on-chip multiprocessors, since the ratio of the off-chip to the on-chip memory access latencies is higher th...
Optimal Placement of Cores, Caches and Memory Controllers in On-Chip Network
Parallel programming is emerging fast, and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Access to the L1 caches beside the cores is the fastest after registers, but the size of private caches cannot increase because of design, cost, and technology limits. Split I-cache and D-cache are therefore used with a shared LLC (last-level cache). For a unified sha...
Optimal Placement of Cores, Caches and Memory Controllers in Network On-Chip
Parallel programming is emerging fast, and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Access to the L1 caches beside the cores is the fastest after registers, but the size of private caches cannot increase because of design, cost, and technology limits. Split I-cache and D-cache are therefore used with a shared LLC (last-level cache). For a unified sha...
Journal title:
Volume, issue:
Pages: -
Publication date: 2010